Search CORE

102 research outputs found

Distributionally Robust Optimization for Sequential Decision Making

Authors: Chen Zhi; Haskell William B.; Yu Pengqian
Publication date: 09/10/2018

The distributionally robust Markov Decision Process (MDP) approach asks for a distributionally robust policy that achieves the maximal expected total reward under the most adversarial distribution of uncertain parameters. In this paper, we study distributionally robust MDPs where ambiguity sets for the uncertain parameters are of a format that can easily incorporate in its description the uncertainty's generalized moment as well as statistical distance information. In this way, we generalize existing works on distributionally robust MDP with generalized-moment-based and statistical-distance-based ambiguity sets to incorporate information from the former class such as moments and dispersions to the latter class that critically depends on empirical observations of the uncertain parameters. We show that, under this format of ambiguity sets, the resulting distributionally robust MDP remains tractable under mild technical conditions. To be more specific, a distributionally robust policy can be constructed by solving a sequence of one-stage convex optimization subproblems

arXiv.org e-Print Archive

ScholarBank@NUS

Model and Reinforcement Learning for Markov Games with Risk Preferences

Authors: Hai Pham Viet; Haskell William B.; Huang Wenjie
Publication date: 21/11/2019

We motivate and propose a new model for non-cooperative Markov games which considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic "risk" from both stochastic state transitions (inherent to the game) and randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed and the existence of such equilibria is demonstrated in stationary strategies by an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures which can naturally be written as saddle-point stochastic optimization problems, and covers many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under some mild conditions. Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making. Comment: 38 pages, 6 tables, 5 figures

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

An Inexact Primal-Dual Smoothing Framework for Large-Scale Non-Bilinear Saddle Point Problems

Authors: Haskell William B.; Hien Le Thi Khanh; Zhao Renbo
Publication date: 09/07/2020

We develop an inexact primal-dual first-order smoothing framework to solve a class of non-bilinear saddle point problems with primal strong convexity. Compared with existing methods, our framework yields a significant improvement over the primal oracle complexity, while it has competitive dual oracle complexity. In addition, we consider the situation where the primal-dual coupling term has a large number of component functions. To efficiently handle this situation, we develop a randomized version of our smoothing framework, which allows the primal and dual sub-problems in each iteration to be solved by randomized algorithms inexactly in expectation. The convergence of this framework is analyzed both in expectation and with high probability. In terms of the primal and dual oracle complexities, this framework significantly improves over its deterministic counterpart. As an important application, we adapt both frameworks for solving convex optimization problems with many functional constraints. To obtain an $\varepsilon$-optimal and $\varepsilon$-feasible solution, both frameworks achieve the best-known oracle complexities (in terms of their dependence on $\varepsilon$).

arXiv.org e-Print Archive